Co-Occurrence-Based Error Correction Approach to Word Segmentation

Authors

  • Ekawat Chaowicharat
  • Kanlaya Naruedomkul
Abstract

To overcome the problems in Thai word segmentation, a number of word segmentation approaches have been proposed over the years. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation based on co-occurrence and an error correction algorithm. CBEC was trained and evaluated on the BEST 2009 corpus.
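The generate-then-select idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: candidate generation here is a plain exhaustive dictionary split rather than maximal matching, the error correction step is left out, and the names `candidates`, `cooccurrence_score`, `segment`, `LEXICON`, and `COOC` are hypothetical. The lexicon and co-occurrence counts are toy English data invented for illustration, with co-occurrence assumed to mean counts of adjacent word pairs.

```python
from typing import Dict, List, Set, Tuple


def candidates(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Enumerate every way to split `text` into words from `lexicon`."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in candidates(text[end:], lexicon):
                results.append([word] + rest)
    return results


def cooccurrence_score(words: List[str],
                       cooc: Dict[Tuple[str, str], int]) -> int:
    """Sum the (assumed) adjacent-pair counts over a candidate segmentation."""
    return sum(cooc.get(pair, 0) for pair in zip(words, words[1:]))


def segment(text: str, lexicon: Set[str],
            cooc: Dict[Tuple[str, str], int]) -> List[str]:
    """Pick the candidate with the highest score; ties go to fewer words."""
    cands = candidates(text, lexicon)
    if not cands:
        return [text]  # no dictionary split exists; return the input unsplit
    return max(cands, key=lambda ws: (cooccurrence_score(ws, cooc), -len(ws)))


# Toy data, invented for illustration only.
LEXICON = {"god", "is", "no", "now", "where", "here", "nowhere"}
COOC = {("is", "now"): 3, ("now", "here"): 5, ("is", "nowhere"): 2}

print(segment("godisnowhere", LEXICON, COOC))
```

With these toy counts, the string has three dictionary-consistent splits, and the selection step prefers the one whose adjacent word pairs co-occur most often. A real system would estimate the counts from a segmented corpus such as the one named in the abstract.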


Similar resources

Correction Approach to Word Segmentation

A number of word segmentation algorithms have been offered in the past; however, there is still room for improvement. Co-occurrence-Based Error Correction (CBEC), the proposed approach in this chapter, is a novel Thai word segmentation approach that was designed to provide accurate segmentation results based on context and purpose. CBEC quickly segments the input string using any available algo...


Comparison of state-of-the-art atlas-based bone segmentation approaches from brain MR images for MR-only radiation planning and PET/MR attenuation correction

Introduction: Magnetic Resonance (MR) imaging has emerged as a valuable tool in radiation treatment (RT) planning as well as Positron Emission Tomography (PET) imaging owing to its superior soft-tissue contrast. Due to the fact that there is no direct transformation from voxel intensity in MR images into electron density, it's crucial to generate a pseudo-CT (Computed Tomography) image ...


Context-based Speech Recognition Error Detection and Correction

In this paper we present preliminary results of a novel unsupervised approach for high-precision detection and correction of errors in the output of automatic speech recognition systems. We model the likely contexts of all words in an ASR system vocabulary by performing a lexical co-occurrence analysis using a large corpus of output from the speech system. We then identify regions in the data th...


Text Segmentation for Chinese Spell Checking

Chinese spell checking is different from its counterparts for Western languages because Chinese words in texts are not separated by spaces. Chinese spell checking in this article refers to how to identify the misuse of characters in text composition. In other words, it is error correction at the word level rather than at the character level. Before Chinese sentences are spell checked, the text ...


The analysis of co-citation and word co-occurrence networks of Iranian articles in the field of dentistry

Background and Aims: Dentistry is an important profession ensuring the health of body and soul, and has a special place in the scientific productions of medical disciplines. The purpose of this study was to analyze the co-citation and word co-occurrence of Iranian research papers in the field of dentistry based on indexed documents in Web of Science from 2014 to 2018. Materials and Methods:...



Publication date: 2011